
Best Practices for URL Formatting (aka how does Amazon handle multiple URLs for the same page?)

I have several questions about URL formatting. I'll start with some questions about Amazon and then relate them to where I work every day. But I think the questions, and their answers, are broadly relevant to anyone with URL formatting questions.

1) How does Amazon get away with having multiple URLs for the same pages? Don't search engines penalize them for that?

Both of these URLs point to the exact same page, but neither one 301-redirects to the other when you enter it:

http://www.amazon.com/books-used-books-textbooks

http://www.amazon.com/exec/obidos/tg/browse/-/283155

All of these URLs go to the same book page:

http://www.amazon.com/exec/obidos/ASIN/0439784549/

http://www.amazon.com/gp/product/0439784549

http://www.amazon.com/Harry-Potter-Half-Blood-Prince-Book/dp/0439784549

http://www.amazon.com/Harry-Potter-Loves-SEOMoz-and-RandFish/dp/0439784549
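All four book URLs above carry the same 10-character ASIN (0439784549), and the last two show that the text before /dp/ can apparently be anything. A minimal sketch of how a site could collapse such variants to a single canonical URL, assuming the three path shapes seen above are the only ones; the patterns and the chosen /dp/<ASIN> canonical form are illustrative guesses, not Amazon's actual routing:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Path shapes taken from the example URLs above; each captures a 10-character ASIN.
# These patterns are illustrative guesses, not Amazon's real routing rules.
ASIN_PATTERNS = [
    re.compile(r"^/exec/obidos/ASIN/([A-Z0-9]{10})/?$"),
    re.compile(r"^/gp/product/([A-Z0-9]{10})/?$"),
    re.compile(r"^/[^/]+/dp/([A-Z0-9]{10})/?$"),  # the descriptive text before /dp/ is ignored
]


def canonical_product_url(url: str) -> Optional[str]:
    """Map a recognized product URL variant to one chosen canonical URL, else None."""
    path = urlsplit(url).path
    for pattern in ASIN_PATTERNS:
        match = pattern.match(path)
        if match:
            # Hypothetical canonical form: /dp/<ASIN> with no descriptive text.
            return f"http://www.amazon.com/dp/{match.group(1)}"
    return None  # not a recognized product URL


variants = [
    "http://www.amazon.com/exec/obidos/ASIN/0439784549/",
    "http://www.amazon.com/gp/product/0439784549",
    "http://www.amazon.com/Harry-Potter-Half-Blood-Prince-Book/dp/0439784549",
    "http://www.amazon.com/Harry-Potter-Loves-SEOMoz-and-RandFish/dp/0439784549",
]
for url in variants:
    print(canonical_product_url(url))  # all four print http://www.amazon.com/dp/0439784549
```

A site that settled on one such form could 301-redirect every other variant to it, which is the "one URL per page" rule discussed later in this post.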

2) Why does www.amazon.com show a Google PageRank of 9 when you type in www.amazon.com, but 0 when you click the main Amazon.com tab? (I understand that the URL itself changes, but I'd still expect Google to recognize that it's the same page.)

3) This leads to my real question. My day job is at JibJab. We're currently working on a site relaunch, and in the process we're cleaning up many of our URLs. Legacy issues have left us with some hideous ones.

This is one of our most famous JibJab Originals (This Land):

http://www.jibjab.com/originals/originals/jibjab/movieid/65

and this is a JokeBox (user-uploaded/UGC) joke:

http://www.jibjab.com/jokebox/jokebox/jibjab/id/605584/jokeid/133547

We're planning to clean up our URLs so they look something like:

http://www.jibjab.com/originals/this_land

http://www.jibjab.com/133547
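Moving to new URLs means every legacy URL needs to map to exactly one new one so it can be 301-redirected. A minimal sketch of that mapping, assuming the two legacy path shapes above are representative; the ORIGINAL_SLUGS table is hypothetical (only movieid 65 → this_land comes from the post), and the meaning of the `id/605584` segment isn't stated, so this sketch simply drops it:

```python
import re
from typing import Optional

# Hypothetical lookup table: legacy Originals movie IDs -> readable slugs.
# Only 65 -> "this_land" comes from the post; a real site would load this from its database.
ORIGINAL_SLUGS = {65: "this_land"}

LEGACY_ORIGINAL = re.compile(r"^/originals/originals/jibjab/movieid/([0-9]+)$")
LEGACY_JOKE = re.compile(r"^/jokebox/jokebox/jibjab/id/[0-9]+/jokeid/([0-9]+)$")


def new_path_for(legacy_path: str) -> Optional[str]:
    """Return the clean path a legacy path should 301 to, or None if there is no mapping."""
    match = LEGACY_ORIGINAL.match(legacy_path)
    if match:
        slug = ORIGINAL_SLUGS.get(int(match.group(1)))
        return f"/originals/{slug}" if slug else None
    match = LEGACY_JOKE.match(legacy_path)
    if match:
        # The id/<n> segment is discarded; only the jokeid survives, per the planned scheme.
        return f"/{match.group(1)}"
    return None


print(new_path_for("/originals/originals/jibjab/movieid/65"))
print(new_path_for("/jokebox/jokebox/jibjab/id/605584/jokeid/133547"))
```

Returning None for unknown IDs (rather than guessing) lets the caller serve a 404 instead of redirecting to a page that doesn't exist.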

With regard to 'This Land', we think that URL is pretty good. A user can see that it is in the 'Originals' section of our site and that the content piece is called 'This Land'.

However, for the joke, we'd prefer to include the joke name as well. Our engineers have pushed back, saying that adding that text would be difficult, so instead we're putting it in our H1 tag.
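One way to get the name into the URL without storing it separately is the pattern the Amazon /dp/ URLs appear to follow: look the page up by its numeric ID and treat the text after it as decoration. A minimal sketch, assuming the planned /<id> scheme with an appended slug; `slugify`, `joke_path`, `resolve`, and the sample title are all hypothetical:

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple


def slugify(title: str) -> str:
    """Lowercase ASCII slug, e.g. 'Hello, World!' -> 'hello-world'."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def joke_path(joke_id: int, title: str) -> str:
    """Hypothetical scheme: the numeric ID is authoritative, the slug is decoration."""
    return f"/{joke_id}/{slugify(title)}"


# /<id>, optionally followed by /<any slug text> and a trailing slash.
JOKE_URL = re.compile(r"^/([0-9]+)(?:/[^/]*)?/?$")


def resolve(path: str, titles: Dict[int, str]) -> Tuple[int, Optional[str]]:
    """Return (200, None) for the canonical path, (301, canonical) for variants, else (404, None)."""
    match = JOKE_URL.match(path)
    if not match or int(match.group(1)) not in titles:
        return 404, None
    joke_id = int(match.group(1))
    canonical = joke_path(joke_id, titles[joke_id])
    return (200, None) if path == canonical else (301, canonical)


titles = {133547: "Hypothetical Joke Title"}  # made-up title for illustration
print(resolve("/133547", titles))  # redirects to the slugged URL
```

Because the lookup uses the ID alone, the slug never needs to be stored or unique. Unlike the Amazon example, where any slug text is served as-is, this sketch 301-redirects every variant (including a stale slug after a title change) to one URL.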

But our big question (at the end of this long post) is: should we insert /jokebox, à la

http://www.jibjab.com/jokebox/133547

This would obviously help indicate the section, but it would add that extra subdirectory to hundreds of thousands of URLs. Is it OK to omit it, or will that confuse search engines as we add other main categories over the coming months and years? For instance, if we add five more main categories (à la Originals and JokeBox; call them 'NewCat1', 'NewCat2', and so on), would it seem strange to search engines for us to use the section name for every category except JokeBox, or do they not interpret a site's structure the way a human would?
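To make the structural trade-off concrete on the server side, here is a minimal path-router sketch, assuming a hypothetical SECTIONS set in which "newcat1" and "newcat2" stand in for the future categories. With a section prefix, every category follows the same /<section>/<key> rule; serving jokes at a bare /<digits> path needs one extra special case, and it reserves all-digit top-level paths for JokeBox:

```python
import re
from typing import Optional, Tuple

# Hypothetical section list; newcat1/newcat2 stand in for the post's future categories.
SECTIONS = {"originals", "jokebox", "newcat1", "newcat2"}


def route(path: str) -> Optional[Tuple[str, str]]:
    """Map a URL path to (section, key), or None if no rule matches."""
    parts = [p for p in path.split("/") if p]
    # Uniform rule: /<section>/<key> works the same way for every category.
    if len(parts) == 2 and parts[0] in SECTIONS:
        return parts[0], parts[1]
    # Special case needed only if jokes live at the bare root, e.g. /133547.
    if len(parts) == 1 and re.fullmatch(r"[0-9]+", parts[0]):
        return "jokebox", parts[0]
    return None


for p in ["/originals/this_land", "/jokebox/133547", "/133547", "/about"]:
    print(p, route(p))
```

If both forms were kept live, one of them should 301 to the other so each joke still has a single URL.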

Sorry if this isn't perfectly clear, but for all the reading I've done on this, I'm still unsure which way to lean: short URLs, or a consistent tree structure? That's why I started with Amazon, because they seem to defy almost every rule I've read (e.g., have only one URL per page, or 301-redirect the others to it).
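Following the "only one URL per page" rule in practice means answering each old URL with a 301 Moved Permanently status and a Location header naming the new URL. A minimal sketch using Python's standard WSGI interface, assuming a hypothetical REDIRECTS table built from the two legacy JibJab URLs in this post and their planned replacements:

```python
from typing import Callable, Dict, Iterable

# Hypothetical redirect table: legacy paths from the post -> their planned clean paths.
REDIRECTS: Dict[str, str] = {
    "/originals/originals/jibjab/movieid/65": "/originals/this_land",
    "/jokebox/jokebox/jibjab/id/605584/jokeid/133547": "/133547",
}
HOST = "http://www.jibjab.com"


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app: 301 legacy paths to their single new URL, 404 anything else."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target:
        # Absolute Location URL, which older HTTP clients expect.
        start_response("301 Moved Permanently", [("Location", HOST + target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve it; in production the same table would more likely live in the web server's rewrite rules or the application's router.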

I hope this spurs some discussion; I look forward to the community's input.

Dave
